Reviews: A Tensorized Transformer for Language Modeling

Neural Information Processing Systems

The code failed to compile and had numerous confusing aspects, and the authors did not link to the actual code used to train the model. Related work exists (April 2019), but I could find no comparison with it. I would also like to see the total FLOP usage compared to the baseline, as FLOPs are frequently the limiting factor in training and deploying models.


Reviews: A Tensorized Transformer for Language Modeling

Neural Information Processing Systems

The reviewers agree that the proposed model is well motivated and that the reduction in parameters achieved is significant. As such, this paper is worthy of publication. However, the reviewers also note a number of issues with the clarity of the presentation, general grammatical errors, and errors in the accompanying code. All of these issues must be addressed before publication. The authors are also required to add a more complete evaluation across a range of parameter scales for the tensorized model and the baseline, and to report the total FLOPs used.



A Tensorized Transformer for Language Modeling

Ma, Xindian, Zhang, Peng, Zhang, Shuai, Duan, Nan, Hou, Yuexian, Zhou, Ming, Song, Dawei

Neural Information Processing Systems

Recent developments in neural models have connected the encoder and decoder through a self-attention mechanism. In particular, the Transformer, which is based solely on self-attention, has led to breakthroughs in Natural Language Processing (NLP) tasks. However, multi-head attention, a key component of the Transformer, limits the effective deployment of the model in resource-limited settings. In this paper, based on the ideas of tensor decomposition and parameter sharing, we propose a novel self-attention model (namely, Multi-linear attention) with Block-Term Tensor Decomposition (BTD). We test and verify the proposed attention method on three language modeling tasks (i.e., PTB, WikiText-103, and One-Billion Word) and a neural machine translation task (i.e., WMT-2016 English-German). Multi-linear attention can not only largely compress the model parameters but also achieve performance improvements, compared with a number of language modeling approaches, such as Transformer, Transformer-XL, and Transformer with tensor train decomposition.
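To make the compression argument concrete, here is a minimal sketch (not the authors' implementation) of the two building blocks the abstract describes: a Tucker-style contraction of a small core tensor with shared query/key/value factors, and a parameter count comparing a block-term parameterization against standard multi-head attention projections. The rank and model sizes below are hypothetical illustration values, not the paper's settings.

```python
import numpy as np

def tucker_contract(core, Q, K, V):
    """Mode-wise (Tucker) contraction of a small core with shared factors.

    core: (r, r, r) block core; Q, K, V: (n, r) shared factor matrices.
    Returns the 3rd-order tensor
        T[i, j, m] = sum_{p,q,r} core[p,q,r] * Q[i,p] * K[j,q] * V[m,r],
    the basic multi-linear interaction used in place of per-head bilinear maps.
    """
    return np.einsum('pqr,ip,jq,mr->ijm', core, Q, K, V)

def btd_param_count(num_blocks, d_model, rank):
    """Parameters for block-term attention: one shared (d_model x rank)
    projection per factor, plus one (rank, rank, rank) core per block."""
    shared = 3 * d_model * rank
    cores = num_blocks * rank ** 3
    return shared + cores

def multihead_param_count(num_heads, d_model):
    """Parameters for standard multi-head attention projections:
    separate Q/K/V projections per head, head dim = d_model // num_heads."""
    return 3 * num_heads * d_model * (d_model // num_heads)

# Hypothetical sizes: sharing factors across blocks shrinks the projection
# parameters from 786,432 to 57,344 in this configuration.
d_model, heads, rank = 512, 8, 16
print(multihead_param_count(heads, d_model))   # 786432
print(btd_param_count(heads, d_model, rank))   # 57344
```

The point of the sketch is the sharing pattern: the factor matrices are reused by every block, so each additional "head" costs only a small core rather than a full set of projections.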